Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert apache2.access to ECS #8901

Closed
wants to merge 1 commit into from

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented Nov 2, 2018

Todo:

  • Update docs
  • Update fields documentation
  • Add aliases for old fields
  • To PR for source in Filebeat
  • CHangelog
  • Docs

@ruflin ruflin added in progress Pull request is currently in progress. module Filebeat Filebeat labels Nov 2, 2018
@@ -4,8 +4,8 @@
"grok": {
"field": "message",
"patterns":[
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:apache2.access.method} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version}\" %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-)( \"%{DATA:apache2.access.referrer}\")?( \"%{DATA:apache2.access.agent}\")?",
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:apache2.access.response_code} -"
"%{IPORHOST:source.hostname} - %{DATA:user.name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code} (?:%{NUMBER:http.response.body_sent.bytes}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:apache2.access.agent}\")?",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat The first IPORHOST is interesting. Looking at the grok pattern it can be an IP or HOSTNAME (the grok definition of HOSTNAME, no port): https://github.com/elastic/elasticsearch/blob/6.4/libs/grok/src/main/resources/patterns/grok-patterns#L31 So far our test logs seem to contain always IP address here. Could this be different?

If it is a hostname, we would also have an issue further down with running geoip on it I assume. This means probably the pattern here should be modified to just be IP?

Copy link
Contributor

@webmat webmat Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%h: Remote hostname. Will log the IP address if HostnameLookups is set to Off, which is the default. If it logs the hostname for only a few hosts, you probably have access control directives mentioning them by name. See the Require host documentation.

You're right that this can be a hostname. Although I would err on the side of not modifying how the module previously worked, otherwise this is no longer an ECS conversion of a module, it becomes an overhaul of the module. And we're talking about doing them all in the next few days...

So in this case, the module probably has problems when people have hostnames there. I would argue this ECS conversion should is unrelated to fixing this problem.

I have no problem if you open issues for problems with this module, though :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just got into a rabbit hole based on this.

So yes, source can be a host (prior to determining if you have an IP literal or a hostname) there. I also saw that apache2.access.remote_ip was previously stored as keyword, which would have supported this. So unless we decide to change this field name (elastic/ecs#166), source.hostname is indeed where you should store this.

Now if this module was doing GeoIP on this field, I guess I would go the extra step and conditionally duplicate this IP to source.ip (after confirming this is an IP), and doing GeoIP on that field instead. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apache2.access.agent: why not store directly in user_agent.original?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about apache2.access.agent, changed.

"fileset.module": "apache2",
"fileset.name": "access",
"http.request.method": "GET",
"http.response.body_sent.bytes": "209",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat I suggest to put this under http here even though it's not in ECS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • It's true that people should be able to nest things freely under ECS fields. However, us doing so in these first conversions may confuse people, no?
  • Plus, we'd be introducing a breaking change for a field that will change again in the future, once we settle on a final ECS name.
    • The way this field is named is not an easy sell like "let's just add it as is to ECS", for me. Because with web servers you can track at least 6 different sizes: request header, body or total size. Same thing for response. So here I would stick with the previous field name under apache.access until we've added this to ECS.

@ruflin
Copy link
Member Author

ruflin commented Nov 2, 2018

@webmat This is not complete yet but worth starting a dicussion on it.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great exercise. There's a lot of things to discuss here :-)

filebeat/_meta/fields.common.yml Outdated Show resolved Hide resolved
filebeat/_meta/fields.common.yml Outdated Show resolved Hide resolved
"fileset.module": "apache2",
"fileset.name": "access",
"http.request.method": "GET",
"http.response.body_sent.bytes": "209",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • It's true that people should be able to nest things freely under ECS fields. However, us doing so in these first conversions may confuse people, no?
  • Plus, we'd be introducing a breaking change for a field that will change again in the future, once we settle on a final ECS name.
    • The way this field is named is not an easy sell like "let's just add it as is to ECS", for me. Because with web servers you can track at least 6 different sizes: request header, body or total size. Same thing for response. So here I would stick with the previous field name under apache.access until we've added this to ECS.

"offset": 73,
"prospector.type": "log"
"prospector.type": "log",
"source.hostname": "::1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source of this HTTP request is the remote.

@@ -4,8 +4,8 @@
"grok": {
"field": "message",
"patterns":[
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:apache2.access.method} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version}\" %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-)( \"%{DATA:apache2.access.referrer}\")?( \"%{DATA:apache2.access.agent}\")?",
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:apache2.access.response_code} -"
"%{IPORHOST:source.hostname} - %{DATA:user.name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code} (?:%{NUMBER:http.response.body_sent.bytes}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:apache2.access.agent}\")?",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just got into a rabbit hole based on this.

So yes, source can be a host (prior to determining if you have an IP literal or a hostname) there. I also saw that apache2.access.remote_ip was previously stored as keyword, which would have supported this. So unless we decide to change this field name (elastic/ecs#166), source.hostname is indeed where you should store this.

Now if this module was doing GeoIP on this field, I guess I would go the extra step and conditionally duplicate this IP to source.ip (after confirming this is an IP), and doing GeoIP on that field instead. WDYT?

@@ -4,8 +4,8 @@
"grok": {
"field": "message",
"patterns":[
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:apache2.access.method} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version}\" %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-)( \"%{DATA:apache2.access.referrer}\")?( \"%{DATA:apache2.access.agent}\")?",
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:apache2.access.response_code} -"
"%{IPORHOST:source.hostname} - %{DATA:user.name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code} (?:%{NUMBER:http.response.body_sent.bytes}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:apache2.access.agent}\")?",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apache2.access.agent: why not store directly in user_agent.original?

dev-tools/ecs-migration.yml Outdated Show resolved Hide resolved
* Update ecs-migration.yml file
* Update changelog
* Update generated files
* Link old fields

Todo:
* Add aliases for old fields
@ruflin ruflin changed the title [WIP] Convert apache2.access to ECS Convert apache2.access to ECS Nov 22, 2018
@ruflin ruflin added the review label Nov 22, 2018
Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave this a careful look, and there's a few things to revisit (more details inline):

  • Rebase needed
  • Can't alias apache2.access.remote_ip (ecs-migration.yml)
  • Create aliases in the module's field definitions
  • HTTP metrics comment (not a deal breaker)
  • source.hostname not in ECS
  • referer typo
  • user_agent parsing currently broken

"rename": {
"field": "apache2.access.agent",
"target_field": "apache2.access.user_agent.original",
"target_field": "user_agent",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your unparsed user_agent is no longer in the apache2.access.agent, so currently the UA parser skips for all of your events. If you look at your JSON expected log results, you'll see they no longer include the parsed user agent :-)

However as I've discovered, you can't start by setting user_agent.original, and then have the UA parser output to user_agent. It won't merge its results around user_agent.original, rather it will overwrite everything there. So you'll have your parsed UA but no longer the original.

The process has to be

  1. Extract original UA to temporary field
  2. Perform UA parsing with default target_field (default is already user_agent)
  3. Move original UA from temporary field to user_agent.original

# Filebeat modules

## Suricata module

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll have to rebase from master, to get the Suricata source_ecs replaced

## Apache

- from: apache2.access.remote_ip
to: source.ip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This alias cannot stay there, since the semantics are different.

In the grok you split this field towards two possible fields. In most cases the ip field will be the one that gets populated, but if HostnameLookups is set to On, part of your events will only populate source.domain.

I think there's value in keeping the ambiguous field around (and therefore not making it an alias), and doing the split towards IP and domain anyway.

@@ -4,8 +4,26 @@
Apache2 Module
short_config: true
fields:
- name: http.response.body_sent.bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to include the HTTP size metrics in ECS. But I don't think they'll look like that. The metrics available are typically request header size, request body size, response header size, response body size. And for both request and response, one could also be interested in having a total size. I'm not saying all of those need to make it to ECS necessarily. But given all of these options, I doubt the response.body_sent.bytes will stick around as is.

I say we should hash that out first in ECS, and revisit all affected modules afterwards. Personally I'd leave as previous for now.

But not a big deal either way. Once we've figured this out in ECS, we'll revisit here anyway :-)

description: >
The number of bytes of the server response body.

- name: source.hostname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source.hostname is no longer in ECS.

description: >
test

- name: http.request.referer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECS doesn't reproduce the typo ;-) It should be http.request.referrer

- from: apache2.access.geoip.*
to: source.geoip.*
alias: false
copy_to: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you get around to actually creating the aliases in the module's field defs, you can check out how I did it for the other modules (example). The definitions are so regular that with judicious find & replace, you can get them adjusted to this module very quickly.

@ruflin ruflin mentioned this pull request Nov 23, 2018
@webmat webmat mentioned this pull request Nov 27, 2018
14 tasks
@ruflin
Copy link
Member Author

ruflin commented Nov 27, 2018

Closing in favor of #9245

@ruflin ruflin closed this Nov 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Filebeat Filebeat in progress Pull request is currently in progress. module review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants